[docs] Improved inpaint docs #5210
Conversation
Nice start, left some comments
Running diffusion models is computationally expensive and intensive, but with a few optimization tricks, it is entirely possible to run them on consumer and free-tier GPUs. For example, you can use a more memory-efficient form of attention such as PyTorch 2.0's [scaled dot-product attention](../optimization/torch2.0#scaled-dot-product-attention) or [xFormers](../optimization/xformers) (you can use one or the other, but there's no need to use both). You can also offload the model to the GPU while the other pipeline components wait on the CPU.

```diff
+ pipeline.enable_model_cpu_offload()
+ pipeline.enable_xformers_memory_efficient_attention()
```
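As a minimal sketch of how these two calls fit into an inpainting pipeline setup (the checkpoint name here is illustrative; substitute any inpainting checkpoint):

```py
# Minimal sketch: load an inpainting pipeline in half precision and
# enable both optimizations. The checkpoint name is illustrative.
import torch
from diffusers import AutoPipelineForInpainting

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
pipeline.enable_xformers_memory_efficient_attention()
```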
With [`torch.compile`](../optimization/torch2.0#torch.compile), you can boost your inference speed even more by wrapping your UNet with it.
I feel like we're using this block of text across many guides now. How can we focus more on the objectives of the guides themselves and redirect users elsewhere for the optimization-related details instead?
(Doesn't need to be addressed in this PR, and I'm open to brainstorming.)
## Optimize

It can be difficult and slow to run diffusion models if you're resource-constrained, but it doesn't have to be with a few optimization tricks. One of the biggest (and easiest) optimizations you can enable is switching to memory-efficient attention. If you're using PyTorch 2.0, [scaled dot-product attention](../optimization/torch2.0#scaled-dot-product-attention) is automatically enabled and you don't need to do anything else. For non-PyTorch 2.0 users, you can install and use [xFormers](../optimization/xformers)'s implementation of memory-efficient attention. Both options reduce memory usage and accelerate inference.
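To make the PyTorch 2.0 path concrete, here is a minimal sketch; note that on PyTorch 2.0+ this processor is already the default, so the explicit call is normally unnecessary:

```py
# AttnProcessor2_0 routes attention through PyTorch 2.0's scaled
# dot-product attention; it is already the default on torch >= 2.0.
from diffusers.models.attention_processor import AttnProcessor2_0

pipeline.unet.set_attn_processor(AttnProcessor2_0())  # assumes `pipeline` from the guide
```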
You can also offload the model to the CPU to save even more memory:

```diff
+ pipeline.enable_xformers_memory_efficient_attention()
+ pipeline.enable_model_cpu_offload()
```
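One behavior worth calling out: once model CPU offload is enabled, you shouldn't also move the pipeline to CUDA yourself. A minimal sketch, assuming the `pipeline`, `init_image`, and `mask_image` variables from earlier in the guide (the prompt is illustrative):

```py
# With enable_model_cpu_offload(), do NOT call pipeline.to("cuda");
# each component is moved to the GPU only while it runs.
image = pipeline(
    prompt="a fluffy cat sitting on a park bench",  # illustrative prompt
    image=init_image,
    mask_image=mask_image,
).images[0]
```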
To speed up your inference code even more, use [`torch.compile`](../optimization/torch2.0#torch.compile). You should wrap `torch.compile` around the most intensive component in the pipeline, which is typically the UNet:
```py
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)
```
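Keep in mind that `torch.compile` pays a one-time compilation cost on the first call, so the speedup only shows up on subsequent generations. A minimal timing sketch, assuming the same `pipeline`, `init_image`, and `mask_image` as above:

```py
import time

# The first call triggers compilation and is slow; later calls reuse
# the compiled graph.
for i in range(2):
    start = time.perf_counter()
    image = pipeline(
        prompt="a fluffy cat sitting on a park bench",  # illustrative prompt
        image=init_image,
        mask_image=mask_image,
    ).images[0]
    print(f"call {i}: {time.perf_counter() - start:.1f}s")
```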
Learn more in the [Reduce memory usage](../optimization/memory) and [Torch 2.0](../optimization/torch2.0) guides.
Yeah, after this section I definitely think we should think a bit more about how we can organize and distribute these components that are heavily reused across different docs.
(Doesn't have to be addressed in this PR.)
Thank you for the changes!
Awesome, thank you!
The documentation is not available anymore as the PR was closed or merged.
Nice, that works for me - it would be great, though, to make sure we only change .md files here :-)
* start
* finish draft
* add section
* edits
* feedback
* make fix-copies
* rebase
Part of #4758 to update the inpainting guide to be more extensive and in-depth.